Using metaquant: A package to Estimate Means, Standard deviations and Visualising distributions using Quantiles

Udara Kumaranathunga, Alysha M De Livera and Luke Prendergast

2025-02-03

Introduction

The metaquant package provides functions for estimating means and standard deviations, and for visualising distributions, from quantile summary data. It is designed for meta-analyses with continuous outcomes, particularly when the included studies report only quantile-based information (e.g., medians, quartiles, or extremes). By using flexible quantile-based distributions, metaquant lets researchers incorporate quantile summary measures into a comprehensive meta-analysis.

The package deals with three common scenarios of reported quantile data, each accompanied by the sample size:

S\(_1\): { \(a\), \(m\), \(b\) }

S\(_2\): { \(q_1\), \(m\), \(q_3\) }

S\(_3\): { \(a\), \(q_1\), \(m\), \(q_3\), \(b\) }

where \(a\), \(q_1\), \(m\), \(q_3\) and \(b\) denote the sample minimum, first quartile, median, third quartile and maximum, respectively.
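
As a minimal sketch of what each scenario contains, the base R snippet below derives the three summaries from a raw sample. The helper name `quantile_scenarios` and the simulated data are illustrative assumptions and are not part of metaquant.

```r
# Hypothetical helper (not part of metaquant): derive the three
# quantile-summary scenarios from a raw numeric sample.
quantile_scenarios <- function(x) {
  a  <- min(x)                                           # sample minimum
  b  <- max(x)                                           # sample maximum
  qs <- unname(quantile(x, probs = c(0.25, 0.5, 0.75)))  # q1, median, q3
  n  <- length(x)                                        # sample size
  list(
    S1 = c(a = a, m = qs[2], b = b, n = n),
    S2 = c(q1 = qs[1], m = qs[2], q3 = qs[3], n = n),
    S3 = c(a = a, q1 = qs[1], m = qs[2], q3 = qs[3], b = b, n = n)
  )
}

# Simulated data standing in for one study's raw observations.
set.seed(1)
scen <- quantile_scenarios(rnorm(50, mean = 10, sd = 2))
scen$S3
```

In practice only one of these summaries is reported by a given study, which is why the package treats them as separate input scenarios.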

When the full 5-number summary is available (S\(_3\)), the package uses the Generalized Lambda Distribution (GLD), a flexible family of distributions capable of approximating many common distributions (e.g., normal, logistic, and log-normal). Specifically, the package uses the GLD under the FKML parameterisation (Freimer et al., 1988), which is defined through its quantile function.
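
To make the phrase "defined through its quantile function" concrete, here is a small base R sketch of the FKML quantile function. The function name `fkml_quantile` and the parameter values are assumptions chosen for illustration; the session information at the end of this document suggests that metaquant relies on the gld package for this rather than on code like the following.

```r
# FKML quantile function of the GLD (illustrative base R sketch):
# Q(u) = l1 + (1 / l2) * [ (u^l3 - 1) / l3 - ((1 - u)^l4 - 1) / l4 ],
# where a term with l3 = 0 or l4 = 0 is replaced by its log limit.
fkml_quantile <- function(u, l1, l2, l3, l4) {
  stopifnot(l2 > 0, all(u > 0 & u < 1))
  left  <- if (l3 == 0) log(u)     else (u^l3 - 1) / l3
  right <- if (l4 == 0) log(1 - u) else ((1 - u)^l4 - 1) / l4
  l1 + (left - right) / l2
}

# With l3 = l4 = 0 the GLD reduces to a logistic distribution with
# location l1 and scale 1 / l2, so the two columns should agree.
u <- c(0.1, 0.25, 0.5, 0.75, 0.9)
cbind(fkml   = fkml_quantile(u, l1 = 2, l2 = 0.5, l3 = 0, l4 = 0),
      qlogis = qlogis(u, location = 2, scale = 2))
```

Here `l1` acts as a location parameter, `l2` as an inverse scale, and `l3` and `l4` control the shape of the left and right tails, which is what gives the family its flexibility.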

When only 3-point summaries are available, as in scenarios S\(_1\) and S\(_2\) above, the package uses the quantile-based skew logistic distribution (SLD) (van Staden and King, 2015). Based on these density-based approaches (GLD and SLD), as well as other existing methods, the package provides functions to estimate summary measures such as sample means and standard deviations from the quantile summaries and sample sizes.
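
The SLD is also defined through its quantile function, which makes its mean available in closed form. The base R sketch below writes out that quantile function and checks the closed-form mean numerically. The function names `sld_quantile` and `sld_mean` and the parameter values are illustrative assumptions, not functions exported by metaquant.

```r
# Quantile function of the quantile-based skew logistic distribution:
# Q(u) = alpha + beta * [ (1 - delta) * log(u) - delta * log(1 - u) ],
# with scale beta > 0 and skewing parameter 0 <= delta <= 1.
sld_quantile <- function(u, alpha, beta, delta) {
  alpha + beta * ((1 - delta) * log(u) - delta * log(1 - u))
}

# Integrating Q(u) over (0, 1) gives the mean in closed form,
# because the integrals of log(u) and log(1 - u) both equal -1.
sld_mean <- function(alpha, beta, delta) alpha + beta * (2 * delta - 1)

# Numerical check of the closed form for arbitrary illustrative values.
num <- integrate(sld_quantile, lower = 0, upper = 1,
                 alpha = 50, beta = 18, delta = 0.8)$value
c(numerical = num, closed_form = sld_mean(50, 18, 0.8))
```

Setting `delta = 0.5` recovers a symmetric logistic distribution, while values above or below 0.5 skew the distribution to the right or left.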

In addition to the estimation of sample means and standard deviations, metaquant includes functions for visualising the estimated distributions of the sample data, covering all 3 scenarios above. The visualisation functions allow users to create density plots using only quantile summaries, enabling exploration of group differences, skewness, and heterogeneity across studies.

For more details on the methodology related to metaquant, refer to De Livera et al. (2024).

References

Alysha De Livera, Luke Prendergast, and Udara Kumaranathunga. A novel density-based approach for estimating unknown means, distribution visualisations, and meta-analyses of quantiles. Submitted for Review, 2024. (Article available on request to authors.)

Marshall Freimer, Georgia Kollia, Govind S. Mudholkar, and C. Thomas Lin. A study of the generalized Tukey lambda family. Communications in Statistics-Theory and Methods, 17(10):3547–3567, 1988.

P. J. van Staden and R. A. R. King. The quantile-based skew logistic distribution. Statistics & Probability Letters, 96:109–116, 2015.

Installing metaquant

metaquant can be installed from CRAN as follows:

#Using CRAN
#install.packages("metaquant")
library(metaquant)

Alternatively, the development version can be installed from GitHub. Before installing this version, make sure that Rtools is installed and configured.

#install.packages("devtools")
#library(devtools)
#devtools::install_github("metaanalysisR/metaquant")

Estimate Summary Statistics using Quantiles

Estimate Mean

The function ‘est.mean’ estimates the sample mean of a study that reports one of the quantile summary scenarios, i.e., either a 3-point summary (S\(_1\) or S\(_2\)) or a 5-point summary (S\(_3\)).

‘est.mean’ implements the flexible quantile-based distribution methods for estimating sample means proposed by De Livera et al. (2024), as well as existing methods described by Luo et al. (2018) and McGrath et al. (2020).

The estimation method is selected through the ‘method’ argument; the examples below use ‘gld/sld’ (the default) and ‘luo’.

Estimating mean using S\(_3\): { \(a\), \(q_1\), \(m\), \(q_3\), \(b\) }

To illustrate the usage of the function, we first generate example 5-point summary data using the ‘rlnorm’ function in the ‘stats’ package.

# Load the libraries
library(metaquant)
library(stats)
#Generate quantile summary data
set.seed(123)
n <- 100
x <- rlnorm(n, 4, 0.3)
quants <- c(min(x), quantile(x, probs = c(0.25, 0.5, 0.75)), max(x))
quants
##                 25%       50%       75%           
##  27.30990  47.07984  55.61930  67.19152 105.23542

Next, assuming ‘quants’ represents a 5-point summary from a study for which the sample mean is needed, we estimate it using the default ‘gld/sld’ method.

#Estimate sample mean of S3 using 'gld/sld'
estmean_gl <- est.mean(min = quants[1], 
                       q1 = quants[2], 
                       med = quants[3],
                       q3 = quants[4],
                       max = quants[5],
                       n=n)
estmean_gl
## $mean
## [1] 57.59711

To estimate the sample mean using a different method, specify it through the ‘method’ argument as follows.

#Estimate sample mean of S3 using the method 'luo'
estmean_luo <- est.mean(min = quants[1], 
                        q1 = quants[2], 
                        med = quants[3],
                        q3 = quants[4],
                        max = quants[5],
                        n=n,
                        method = "luo")
estmean_luo
## $mean
## [1] 57.28699
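
Because the example data are simulated, the two estimates can be compared with the actual sample mean. The base R sketch below regenerates the sample with the same seed and computes the error of each estimate. The estimates are copied from the printed output above rather than recomputed, so the snippet does not need metaquant.

```r
# Regenerate the simulated sample used above (same seed and parameters).
set.seed(123)
n <- 100
x <- rlnorm(n, 4, 0.3)

# Estimates copied from the printed output above.
est <- c("gld/sld" = 57.59711, luo = 57.28699)

# Actual sample mean and the error of each estimate relative to it.
true_mean <- mean(x)
cmp <- data.frame(method    = names(est),
                  estimate  = unname(est),
                  abs_error = abs(unname(est) - true_mean),
                  rel_error = abs(unname(est) - true_mean) / true_mean)
cmp
```

A comparison on a single simulated sample only illustrates the workflow and says little about which method performs better in general.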

Estimating mean using S\(_1\): { \(a\), \(m\), \(b\) }

Suppose only the minimum, median, and maximum are available as your quantile summaries.

# 3-point summary data for S1
quants1 <- c(min(x), quantile(x, probs = 0.5), max(x))
quants1
##               50%          
##  27.3099  55.6193 105.2354

Then, instead of providing all five quantile inputs to the function, you can use the arguments ‘min’, ‘med’, and ‘max’ along with the sample size ‘n’ to estimate the sample mean. To estimate the sample mean of S\(_1\) using the ‘gld/sld’ method:

#Estimate sample mean for S1
estmean_sl_1 <- est.mean(min = quants1[1], 
                        med = quants1[2],
                        max = quants1[3],
                        n=n,
                        method = "gld/sld")
estmean_sl_1
## $mean
## [1] 57.28843

Similarly, the ‘method’ argument can be adjusted to use a different estimation method, as described above.

Estimating mean using S\(_2\): { \(q_1\), \(m\), \(q_3\) }

Similarly, if only the first quartile, median, and third quartile are available, use the ‘q1’, ‘med’ and ‘q3’ arguments of the function.

# 3-point summary data for S2
quants2 <- quantile(x, probs = c(0.25, 0.5, 0.75))
quants2
##      25%      50%      75% 
## 47.07984 55.61930 67.19152
#Estimate sample mean for S2
estmean_sl_2 <- est.mean(q1 = quants2[1], 
                        med = quants2[2],
                        q3 = quants2[3],
                        method = "gld/sld")
estmean_sl_2
## $mean
## [1] 58.85415

Note that the ‘gld/sld’ method under S\(_2\) does not require the sample size to estimate the sample mean, so the argument ‘n’ can be omitted in this case. All other methods require the sample size.
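
One way to see why three quartiles can suffice without the sample size is that the SLD has exactly three parameters, so it can be matched to three quantiles directly. The base R sketch below does this by solving a linear system. It is an illustration under that assumption and is not taken from the metaquant source code.

```r
# S2 summary copied from the output above: first quartile, median, third quartile.
q <- c(47.07984, 55.61930, 67.19152)
p <- c(0.25, 0.50, 0.75)

# The SLD quantile function alpha + b1 * log(p) - b2 * log(1 - p),
# with b1 = beta * (1 - delta) and b2 = beta * delta, is linear in
# (alpha, b1, b2), so three quantiles give a 3 x 3 linear system.
A   <- cbind(1, log(p), -log(1 - p))
sol <- solve(A, q)

alpha <- sol[1]
beta  <- sol[2] + sol[3]
delta <- sol[3] / beta

# Mean of the fitted SLD: alpha + beta * (2 * delta - 1).
mean_s2 <- alpha + beta * (2 * delta - 1)
c(alpha = alpha, beta = beta, delta = delta, mean = mean_s2)
```

For these quartiles the fitted mean comes out at roughly 58.85, close to the value returned by ‘est.mean’ above, although that agreement alone does not establish how the package fits the distribution internally.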

Estimate Standard Deviation

For completeness, the package provides the function ‘est.sd’, which estimates the sample standard deviation for a study reporting quantile summary data, i.e., a 3-point summary (S\(_1\) or S\(_2\)) or a 5-point summary (S\(_3\)). The function operates similarly to ‘est.mean’ but uses estimation methods specific to the standard deviation.

‘est.sd’ implements several methods for estimating the standard deviation, with the method of Shi et al. (2020) set as the default.

For example, to estimate the sample standard deviation from a 5-point summary, apply ‘est.sd’ with the quantiles and the sample size as inputs. We use the same data ‘quants’ as in the ‘Estimate Mean’ section, with the default method ‘shi/wan’.

#Estimate sample SD of S3 using 'shi/wan' method
estsd_shi <- est.sd(min = quants[1], 
                    q1 = quants[2], 
                    med = quants[3],
                    q3 = quants[4],
                    max = quants[5],
                    n=n)
estsd_shi
## $sd
## [1] 15.34892
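
For intuition about how a standard deviation can be recovered from a five-number summary, the sketch below applies the simpler normal-theory estimator of Wan et al. (2014), which averages a range-based and an interquartile-range-based estimate. This is a swapped-in illustration and not the computation performed by ‘est.sd’: the default method of Shi et al. (2020) replaces the plain average with optimally chosen weights. The helper name `sd_wan_s3` is hypothetical.

```r
# Hypothetical helper: normal-theory SD estimate from a five-number summary,
# averaging a range-based and an IQR-based estimate (Wan et al., 2014).
sd_wan_s3 <- function(a, q1, q3, b, n) {
  xi  <- 2 * qnorm((n - 0.375) / (n + 0.25))         # expected range in SD units
  eta <- 2 * qnorm((0.75 * n - 0.125) / (n + 0.25))  # expected IQR in SD units
  0.5 * ((b - a) / xi + (q3 - q1) / eta)
}

# Five-number summary and sample size copied from the example above.
sd_wan <- sd_wan_s3(a = 27.30990, q1 = 47.07984, q3 = 67.19152,
                    b = 105.23542, n = 100)
sd_wan
```

Both divisors grow with the sample size, reflecting the fact that the expected range and, to a lesser extent, the expected interquartile range of a sample depend on how many observations it contains.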

Estimation for Two Groups

In addition to ‘est.mean’ and ‘est.sd’, the package provides two functions, ‘est.mean.2g’ and ‘est.sd.2g’, for estimating the sample mean and standard deviation in two-group studies from quantile summary measures. These functions use only the GLD or SLD methods, as the other estimation methods have no two-group variants.

In particular, these functions implement the method proposed by De Livera et al. (2024) for two-group cases. The approach uses the Generalized Lambda Distribution (GLD) for 5-number summaries (S\(_3\)) and the Skew Logistic Distribution (SLD) for 3-number summaries (S\(_1\) and S\(_2\)), and incorporates information shared across the two groups to improve the accuracy of the estimates.

As a result, these two functions do not require a ‘method’ argument. Instead, they take additional arguments for the summary measures of the second group.

For instance, consider the following quantile summaries for two groups. In this case, the ‘rexp’ function from the ‘stats’ package is used to generate example samples with exponential distributions. You may need to load the necessary libraries if they are not already loaded.

#Generate 5-point summary data for two groups
set.seed(123)
n_t <- 100
n_c <- 120
x_t <- rexp(n_t, 5)
x_c <- rexp(n_c, 10)
q_t <- c(min(x_t), quantile(x_t, probs = c(0.25, 0.5, 0.75)), max(x_t))
q_c <- c(min(x_c), quantile(x_c, probs = c(0.25, 0.5, 0.75)), max(x_c))

As in the single-group case, the ‘est.mean.2g’ and ‘est.sd.2g’ functions can be applied as below.

#Estimate sample mean of S3 
estmean_2g <- est.mean.2g(q_t[1],q_t[2],q_t[3],q_t[4],q_t[5],
                          q_c[1],q_c[2],q_c[3],q_c[4],q_c[5],
                          n.g1 = n_t,
                          n.g2 = n_c)
estmean_2g
## $mean.g1
## [1] 0.2330208
## 
## $mean.g2
## [1] 0.08490334
#Estimate sample SD of S3 
estsd_2g <- est.sd.2g(q_t[1],q_t[2],q_t[3],q_t[4],q_t[5],
                      q_c[1],q_c[2],q_c[3],q_c[4],q_c[5],
                      n.g1 = n_t,
                      n.g2 = n_c)
estsd_2g
## $sd.g1
## [1] 0.2587828
## 
## $sd.g2
## [1] 0.07264839

When only three number summaries (S\(_1\) and S\(_2\)) are available, the corresponding inputs for the two groups can be used directly.

References

Alysha De Livera, Luke Prendergast, and Udara Kumaranathunga. A novel density-based approach for estimating unknown means, distribution visualisations, and meta-analyses of quantiles. Submitted for Review, 2024. (Article available on request to authors.)

Dehui Luo, Xiang Wan, Jiming Liu, and Tiejun Tong. Optimally estimating the sample mean from the sample size, median, mid-range, and/or mid-quartile range. Statistical Methods in Medical Research, 27(6):1785–1805, 2018.

Sean McGrath, XiaoFei Zhao, Russell Steele, Brett D Thombs, Andrea Benedetti, and DEPRESsion Screening Data (DEPRESSD) Collaboration. Estimating the sample mean and standard deviation from commonly reported quantiles in meta-analysis. Statistical Methods in Medical Research, 29(9):2520–2537, 2020.

Jiandong Shi, Dehui Luo, Hong Weng, Xian-Tao Zeng, Lu Lin, Haitao Chu, and Tiejun Tong. Optimally estimating the sample standard deviation from the five-number summary. Research Synthesis Methods, 11(5):641–654, 2020.

Visualise Densities using Quantiles

The ‘plotdist’ function estimates and visualizes the density curves of one or two groups (samples) using one of the quantile summary scenarios, i.e., either 3-point summaries (S\(_1\) or S\(_2\)) or 5-point summaries (S\(_3\)). It returns a customizable and interactive plotly object showing the estimated density curve(s) of the individual studies as well as the pooled densities.

Prepare the Dataset

The input data to ‘plotdist’ should be a data frame containing the quantile summary data. For one-group studies, the input data frame can include the following columns: study.index, min.g1, q1.g1, med.g1, q3.g1, max.g1 and n.g1.

For two-group studies, the data frame can also contain the following columns for the summary data of the second group: min.g2, q1.g2, med.g2, q3.g2, max.g2 and n.g2.

If only 3-number summaries are available, only the respective columns for the 3-point summary should be included in the data frame.
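
When raw data are available for testing, an input data frame of this shape can be assembled in base R. The sketch below uses the one-group column names that appear in the examples of this section (‘study.index’, ‘min.g1’, ‘q1.g1’, ‘med.g1’, ‘q3.g1’, ‘max.g1’, ‘n.g1’). The helper name `summarise_study` and the simulated samples are illustrative assumptions.

```r
# Hypothetical helper: one row of quantile summaries for a one-group study.
summarise_study <- function(x, index) {
  qs <- unname(quantile(x, probs = c(0.25, 0.5, 0.75)))
  data.frame(study.index = index,
             min.g1 = min(x), q1.g1 = qs[1], med.g1 = qs[2],
             q3.g1 = qs[3], max.g1 = max(x), n.g1 = length(x))
}

# Simulated raw samples standing in for three studies.
set.seed(42)
samples <- list(Study1 = rnorm(80, 70, 10),
                Study2 = rnorm(120, 75, 12),
                Study3 = rnorm(100, 72, 9))

# Stack the per-study rows into a single input data frame.
rows <- unname(Map(summarise_study, samples, names(samples)))
data_sim <- do.call(rbind, rows)
rownames(data_sim) <- NULL
data_sim

# For a 3-point scenario, keep only the relevant columns, e.g. S2:
data_sim_s2 <- data_sim[, c("study.index", "q1.g1", "med.g1", "q3.g1", "n.g1")]
data_sim_s2
```

In a real meta-analysis the rows would instead be typed in from the published summaries, as in the examples that follow.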

Visualise Densities of One-Group Studies

For example, consider the following dataset which includes three one-group studies, each reporting 5-point summaries along with their respective sample sizes.

# Dataset of 5-point summaries for 1 group
data_s3 <- data.frame(
  study.index = c("Study1", "Study2", "Study3"),
  min.g1 = c(18, 19, 15),
  q1.g1 = c(66, 71, 69),
  med.g1 = c(73, 82, 81),
  q3.g1 = c(80, 93, 89),
  max.g1 = c(110, 115, 100),
  n.g1 = c(226, 230, 200)
)
data_s3
##   study.index min.g1 q1.g1 med.g1 q3.g1 max.g1 n.g1
## 1      Study1     18    66     73    80    110  226
## 2      Study2     19    71     82    93    115  230
## 3      Study3     15    69     81    89    100  200

Then, using the data above, the density curves of the three studies can be visualised in the same plot using ‘plotdist’ as illustrated below.

# Plot densities 
plot_s3 <- plotdist(
  data_s3,
  xmin = 10,
  xmax = 125,
  title = "Example Density Plot of S3",
  xlab = "x data",
  title.size = 11,
  lab.size = 10,
  color.g1 = "blue",
  display.index = FALSE,
  display.legend = FALSE
)
plot_s3

Note that the arguments ‘xmin’ and ‘xmax’ set the lower and upper limits of the x-axis over which the density is calculated. To ensure the density curve is fully captured, it is recommended to set ‘xmin’ to a value smaller than the smallest minimum across the studies in the dataset, and ‘xmax’ to a value larger than the largest maximum. If these values are not provided, the function uses the minimum of the ‘min.’ columns and the maximum of the ‘max.’ columns under scenarios S\(_1\) and S\(_3\). Under scenario S\(_2\) there are no ‘min.’ and ‘max.’ columns, so no default can be computed and an error occurs; ‘xmin’ and ‘xmax’ must then be supplied by the user.
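
One simple way to choose limits that satisfy this recommendation is to pad the overall range of the reported minima and maxima. The base R sketch below does so for the 5-point data used above. The 5% padding is an arbitrary choice made for illustration, not a package default.

```r
# Same 5-point summary data as in the example above.
data_s3 <- data.frame(
  study.index = c("Study1", "Study2", "Study3"),
  min.g1 = c(18, 19, 15), q1.g1 = c(66, 71, 69), med.g1 = c(73, 82, 81),
  q3.g1 = c(80, 93, 89), max.g1 = c(110, 115, 100), n.g1 = c(226, 230, 200)
)

# Collect every 'min.' and 'max.' column, so the same code also covers
# two-group data frames that contain min.g2 and max.g2.
min_cols <- grep("^min\\.", names(data_s3), value = TRUE)
max_cols <- grep("^max\\.", names(data_s3), value = TRUE)
lo <- min(unlist(data_s3[min_cols]))
hi <- max(unlist(data_s3[max_cols]))

# Pad the overall range by 5% on each side (arbitrary illustrative choice).
pad  <- 0.05 * (hi - lo)
lims <- c(xmin = lo - pad, xmax = hi + pad)
lims
```

Under S\(_2\) the same idea could be applied to the ‘q1.’ and ‘q3.’ columns with a more generous padding, since the quartiles cover only the central half of each sample.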

Suppose you have a dataset of quantile summaries of 3 studies, each reporting 3-point summaries of S\(_1\) along with their sample sizes.

# Dataset of 3-point summaries for 1 group
data_s1 <- data.frame(
  study.index = c("Study1", "Study2", "Study3"),
  min.g1 = c(18, 19, 15),
  med.g1 = c(73, 82, 81),
  max.g1 = c(110, 115, 100),
  n.g1 = c(226, 230, 200)
)
data_s1
##   study.index min.g1 med.g1 max.g1 n.g1
## 1      Study1     18     73    110  226
## 2      Study2     19     82    115  230
## 3      Study3     15     81    100  200

Note that, in this case, the data frame consists only of the columns representing the minimum, median and maximum values. The density curves of the three studies reporting 3-point summaries can then be visualised using ‘plotdist’ as illustrated below.

# Plot densities 
plot_s1 <- plotdist(
  data_s1,
  xmin = 10,
  xmax = 125,
  title = "Example Density Plot of S1",
  xlab = "x data",
  color.g1 = "purple",
  display.index = FALSE,
  display.legend = FALSE
)
plot_s1

Visualise Densities of Two-Group Studies

For instance, assume you have the following dataset of two-group studies, each reporting 5-point summaries for both group 1 and group 2 along with their respective sample sizes.

# Dataset of 5-point summaries for 2 groups
data_2g <- data.frame(
  study.index = c("Study1", "Study2", "Study3"),
  min.g1 = c(18, 19, 15),
  q1.g1 = c(66, 71, 69),
  med.g1 = c(73, 82, 81),
  q3.g1 = c(80, 93, 89),
  max.g1 = c(110, 115, 100),
  n.g1 = c(226, 230, 200),
  min.g2 = c(15, 15, 13),
  q1.g2 = c(57, 59, 55),
  med.g2 = c(66, 68, 60),
  q3.g2 = c(74, 72, 69),
  max.g2 = c(108, 101, 100),
  n.g2 = c(201, 223, 198)
)
data_2g
##   study.index min.g1 q1.g1 med.g1 q3.g1 max.g1 n.g1 min.g2 q1.g2 med.g2 q3.g2
## 1      Study1     18    66     73    80    110  226     15    57     66    74
## 2      Study2     19    71     82    93    115  230     15    59     68    72
## 3      Study3     15    69     81    89    100  200     13    55     60    69
##   max.g2 n.g2
## 1    108  201
## 2    101  223
## 3    100  198

Once you pass the above data frame to ‘plotdist’ with the appropriate inputs, you obtain the density curves of the two groups, displayed in different colors within the same plot. The colors can be specified by the user; here, we use the defaults defined by the function.

# Plot densities 
plot_2g <- plotdist(
  data_2g,
  xmin = 10,
  xmax = 125,
  title = "Example Density Plot of Two Groups",
  xlab = "x data",
  title.size = 11,
  label.g1 = "Treatment", 
  label.g2 = "Control",
  display.index = FALSE,
  display.legend = TRUE
)
plot_2g

To display the legend labels, you need to provide the names of the groups for the ‘label.g1’ and ‘label.g2’ arguments and set ‘display.legend = TRUE’.

Visualise Pooled Densities

If you need to generate pooled density plots, you can set the ‘pooled.dist’ or ‘pooled.only’ arguments to ‘TRUE’. By default these arguments are set to ‘FALSE’. When ‘pooled.dist = TRUE’, the pooled density curves will be displayed along with the individual density curves and when ‘pooled.only = TRUE’, only the pooled density curves will be plotted, excluding the individual curves.

For example, pooled curves can be added to the previous plot ‘plot_2g’ as follows. Again, the default colors assigned to the two groups are used. You can customize the colors by using the ‘color.g1’, ‘color.g2’, ‘color.g1.pooled’, and ‘color.g2.pooled’ arguments.

# Plot densities with pooled curves
plot_2g <- plotdist(
  data_2g,
  xmin = 10,
  xmax = 125,
  title = "Example Density Plot with Pooled Densities",
  xlab = "x data",
  title.size = 11,
  label.g1 = "Treatment", 
  label.g2 = "Control",
  display.index = FALSE,
  display.legend = FALSE,
  pooled.dist = TRUE
)
plot_2g
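
For comparison, the sketch below requests only the pooled curves through the ‘pooled.only’ argument described above. It rebuilds the two-group data so that it is self-contained, and the call is guarded so that it runs only when metaquant is installed. The plot title and the object name `plot_pooled` are illustrative choices.

```r
# Two-group 5-point summaries, as in the example above.
data_2g <- data.frame(
  study.index = c("Study1", "Study2", "Study3"),
  min.g1 = c(18, 19, 15), q1.g1 = c(66, 71, 69), med.g1 = c(73, 82, 81),
  q3.g1 = c(80, 93, 89), max.g1 = c(110, 115, 100), n.g1 = c(226, 230, 200),
  min.g2 = c(15, 15, 13), q1.g2 = c(57, 59, 55), med.g2 = c(66, 68, 60),
  q3.g2 = c(74, 72, 69), max.g2 = c(108, 101, 100), n.g2 = c(201, 223, 198)
)

# Plot only the pooled density curves; skipped if metaquant is not installed.
if (requireNamespace("metaquant", quietly = TRUE)) {
  plot_pooled <- metaquant::plotdist(
    data_2g,
    xmin = 10,
    xmax = 125,
    title = "Pooled Densities Only",
    xlab = "x data",
    label.g1 = "Treatment",
    label.g2 = "Control",
    display.legend = TRUE,
    pooled.only = TRUE
  )
  plot_pooled
}
```

Showing only the pooled curves can make the overall group difference easier to read when a dataset contains many studies.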

Contact and Session Information

Contact Information

For any queries, contact Alysha De Livera a.delivera@latrobe.edu.au or Udara Kumaranathunga u.kumaranathunga@latrobe.edu.au.

Session Information

sessionInfo()
## R version 4.4.1 (2024-06-14 ucrt)
## Platform: x86_64-w64-mingw32/x64
## Running under: Windows 10 x64 (build 19045)
## 
## Matrix products: default
## 
## 
## locale:
## [1] LC_COLLATE=English_Australia.utf8  LC_CTYPE=English_Australia.utf8   
## [3] LC_MONETARY=English_Australia.utf8 LC_NUMERIC=C                      
## [5] LC_TIME=English_Australia.utf8    
## 
## time zone: Australia/Sydney
## tzcode source: internal
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] metaquant_0.1.0
## 
## loaded via a namespace (and not attached):
##  [1] gtable_0.3.5      jsonlite_1.8.9    dplyr_1.1.4       compiler_4.4.1   
##  [5] tidyselect_1.2.1  tidyr_1.3.1       jquerylib_0.1.4   scales_1.3.0     
##  [9] gld_2.6.6         yaml_2.3.10       fastmap_1.2.0     ggplot2_3.5.1    
## [13] R6_2.5.1          labeling_0.4.3    generics_0.1.3    sld_1.0.1        
## [17] knitr_1.48        htmlwidgets_1.6.4 tibble_3.2.1      estmeansd_1.0.1  
## [21] munsell_0.5.1     bslib_0.8.0       pillar_1.9.0      rlang_1.1.4      
## [25] utf8_1.2.4        cachem_1.1.0      xfun_0.49         sass_0.4.9       
## [29] lazyeval_0.2.2    viridisLite_0.4.2 plotly_4.10.4     cli_3.6.3        
## [33] magrittr_2.0.3    crosstalk_1.2.1   class_7.3-22      digest_0.6.37    
## [37] grid_4.4.1        rstudioapi_0.16.0 lifecycle_1.0.4   vctrs_0.6.5      
## [41] data.table_1.16.2 proxy_0.4-27      evaluate_1.0.0    glue_1.7.0       
## [45] fansi_1.0.6       lmom_3.0          e1071_1.7-16      colorspace_2.1-1 
## [49] purrr_1.0.2       httr_1.4.7        rmarkdown_2.29    tools_4.4.1      
## [53] pkgconfig_2.0.3   htmltools_0.5.8.1
